LGMD gene coding for a calcium dependent protease

The invention relates to the isolated gene coding for a calcium dependent 
protease belonging to the Calpain family which, when it is mutated, is a cause of 
a disease called Limb-Girdle Muscular Dystrophy (LGMD). 

The term limb-girdle muscular dystrophy (LGMD) was first proposed by 
Walton and Nattrass (1954) as part of a classification of muscular dystrophies. 
LGMD is characterised by progressive symmetrical atrophy and weakness of the 
proximal limb muscles and by elevated serum creatine kinase. Muscle biopsies 
demonstrate dystrophic lesions and electromyograms show myopathic features. 
The symptoms usually begin during the first two decades of life and the disease
gradually worsens, often resulting in loss of walking ability 10 or 20 years after
onset (Bushby, 1994). Yet the precise nosological definition of LGMD still
remains unclear. Consequently, various neuromuscular diseases such as
facioscapulohumeral and Becker muscular dystrophies, and especially spinal
muscular atrophies, have been occasionally classified under this diagnosis. For
example, a recent study (Arikawa et al., 1991) reported that 17% (out of 41) of
LGMD patients showed a dystrophinopathy. These issues highlight the difficulty
in undertaking an analysis of the molecular and genetic defect(s) involved in this
pathology.

Attempts to identify the genetic basis of this disease go back over 35 years.
Morton and Chung (1959) estimated that "the frequency of heterozygous carrier
... is 16 per thousand persons". The same authors also stated that "the
segregation analysis gives no evidence on whether these genes in different
families are allelic or at different loci". Both autosomal dominant and recessive
transmission have been reported, the latter being more common with an
estimated prevalence of 10^-5 (Emery, 1991). The localisation of a gene for a
recessive form on chromosome 15 (LGMD2A, MIM 253600; Beckmann et al.,
1991) provided the definitive proof that LGMD is a specific genetic entity.
Subsequent genetic analyses confirmed this chromosome 15 localisation (Young
et al., 1992; Passos-Bueno et al., 1993), the latter group demonstrating genetic
heterogeneity of this disease. Although a recent study localised a second mutant
gene to chromosome 2 (LGMD2B, MIM 253601; Bashir et al., 1994), there is
evidence that at least one other locus can be involved.

Genetic analyses of the LGMD2 kindreds revealed unexpected findings.
First, genetic heterogeneity was demonstrated in the highly inbred Indiana Amish
community. Second, although the Isle of La Reunion families were thought to
represent a genetic isolate, at least 6 different disease haplotypes were
observed, providing evidence against the hypothesis of a single founder effect
(Beckmann et al., 1991) in this inbred population.

The nonspecific nosological definition, the relatively low prevalence and
genetic heterogeneity of this disorder limit the number of families which can be
used to restrict the genetic boundaries of the LGMD2A interval. Cytogenetic
abnormalities, which could have helped to focus on a particular region, have not
been reported. Immunogenetic studies of dystrophin-associated proteins
(Matsumura et al., 1993) and cytoskeletal or extracellular matrix proteins such as
merosin (Tome et al., 1994) failed to demonstrate any deficiency. In addition,
there is no known specific physiological feature or animal model that could help
to identify a candidate gene. Thus, there is no alternative to a positional cloning
strategy.

It is established that the LGMD2 chromosomal region is localized on
chromosome 15, as the 15q15.1-15q21.1 region (Fougerousse et al., 1994).

Construction and analysis of a 10-12 Mb YAC contig (Fougerousse et al.,
1994) permitted the mapping of 33 polymorphic markers within this interval and
the further narrowing of the LGMD2A region to between D15S514 and D15S222.
Furthermore, extensive analysis of linkage disequilibrium suggested a likely
position for the gene in the proximal part of the contig.

The invention results from the construction of a partial cosmid map and the
screening by cDNA selection (Lovett et al., 1991; Tagle et al., 1993) for muscle-
expressed sequences encoded by this interval, which led to the identification of a
number of potential candidate genes. One of these, previously cloned by
Sorimachi et al. (1989), encodes a muscle specific protein, nCL1 (novel Calpain
Large subunit 1), which belongs to the calpain family (CANP, calcium-activated
neutral protease; EC [...]) [...] for this disease [...] gene.

Calpains are non-lysosomal intracellular cysteine proteases which require
calcium for their catalytic activities (for a review see Croall D.E. et al., 1991). The
mammalian calpains include two ubiquitous proteins, CANP1 and CANP2, as well
as tissue-specific proteins. In addition to the muscle specific nCL1, stomach
specific nCL2 and nCL2' proteins have also been described: these are derived
from the same gene by alternative splicing. The ubiquitous enzymes consist of
heterodimers with distinct large subunits associated with a common small
subunit; the association of tissue-specific large subunits with a small subunit
has not yet been demonstrated. The large subunits of calpains can be
subdivided into 4 protein domains. Domains I and III, whose functions remain
unknown, show no homology with known proteins. Domain I, however, seems
important for the regulation of the proteolytic activity. Domain II shows similarity
with other cysteine proteases, sharing histidine, cysteine and asparagine
residues at its active sites. Domain IV comprises four EF-hand structures which
are potential calcium binding sites. In addition, three unique regions with no
known homology are present in the muscle-specific nCL1 protein, namely NS,
IS1 and IS2, the latter containing a nuclear translocation signal. These regions
may be important for the muscle specific function of nCL1.

It is usually accepted that muscular dystrophies are associated with excess
or deregulated calpains, and all the known approaches for curing these diseases
rely on the use of antagonists of these proteases; examples are disclosed in EP
359309 or EP 525420.

The invention results from the finding that, contrary to all these
hypotheses, the LGMD2 disease is strongly correlated with the defect of a calpain
which is expressed in healthy people.

The invention relates to the nucleic acid sequence such as represented in
Figure 2 coding for a Ca2+-dependent protease, or calpain, which is involved in
LGMD2 disease, and more precisely LGMD2A. It also relates to a part of this
sequence provided it is able to code for a protein having a calcium-dependent
protease activity involved in LGMD2, or a sequence derived from one of the
above sequences by substitution, deletion or addition of one or more nucleotides
provided that said sequence is still coding for said protein, all the nucleic acids
yielding a sequence complementary to a sequence as defined above.

The genomic organisation of the human nCL1 gene has been determined by
the inventors; it consists of 24 exons and extends over 40 kb as represented
in Figure 8, and is also a part of the invention. About 35 kb of this gene have
been sequenced. A systematic screening of this gene in LGMD2A families led to
the identification of 14 different mutations, establishing that a number of
independent mutational events in nCL1 are responsible for LGMD2A.
Furthermore, this is the first demonstration of a muscular dystrophy resulting
from an enzymatic rather than a structural defect.

In the present specification, CANP3 means the protein which is a Ca2+-
dependent protease, or calpain, and is coded by the nCL1 gene on chromosome
15.

The invention also relates to a protein, called CANP3, consisting of the
amino acid sequence such as represented in Figure 2 and which is involved,
when mutated, in the LGMD2 disease.

The cDNA of the gene coding for CANP3, which is coding for the protein, is
also represented in Figure 2, and is a part of the invention.

The protein coded by this DNA is CANP3, a calcium-dependent protease
belonging to the Calpain family.

Also included in the present invention are the nucleic acid sequences
derived from the cDNA of Figure 2 by one or more substitutions, deletions or
insertions, or by mutations in 5' or 3' non coding regions or in splice sites,
provided that the translated protein has the calcium-dependent protease
activity and, when mutated, induces LGMD2 disease.

The nucleic acid sequence encoding the protein might be DNA or RNA and
be complementary to the nucleic acid sequence represented in Figure 2.

The invention also relates to a recombinant vector including a DNA 
sequence of the invention, under the control of a promoter allowing the 
expression of the calpain in an appropriate host cell. 

A procaryotic or eucaryotic host cell transformed by or transfected with a
DNA sequence comprising all or part of the sequence of Figure 2 is a part of the
invention.

Such a host cell might be either:
- a cell which is able to secrete the protein, and this recombinant protein
might be used as a drug to treat the LGMD2, or

- a packaging cell line transfected by a viral or retroviral vector; the cell
lines bearing the recombinant vector might be used as a drug for gene therapy of
LGMD2.

All the systems used today for gene therapy, including adenoviruses and
retroviruses and others described for example in « l'ADN médicament » (John
Libbey Eurotext, 1993), and bearing one of the DNA sequences of the invention,
are included herein by reference.
The examples hereunder and attached figures indicate how the structure of
the gene was established, and how the relationship between the gene and
LGMD was established.







Legend of the figures:

Figure 1:

A) Genomic organisation of the nCL1 gene

The gene covers a 40 kb region of which 35 kb were sequenced (Accession
number pending). Introns and exons are drawn to scale, the latter being
indicated by numbered vertical bars. The first intron is the largest one and
remains to be fully sequenced. Positions of intragenic microsatellites are indicated
by asterisks. Arrows indicate the orientation of Alu (closed) and of Mer2 (greyed)
repeat sequences.

B) EcoRI restriction map

An EcoRI (E) restriction map of this region was established with the help
of cosmids from this region. The location of the nCL1 gene is indicated as a black
bar. The sizes of the corresponding fragments are indicated and are underlined
when determined by sequence analysis.

C) Cosmid map of the nCL1 gene region

Cosmids were from a cosmid library constructed by subcloning YAC
774G4 (Richard, in preparation) and are presented as lines. Dots on lines
indicate positive STSs (indicated in boxed rectangles). A minimum of three
cosmids cover the entire gene. T3, T7 [...]

Figure 2: Sequence of the human nCL1 cDNA (B), and of the flanking 5' (A) and
3' (C) genomic regions.

A) and C) The polyadenylation signal and putative CAAT, TATAA sites are
boxed. Putative Sp1 (position -477 to -472), MEF2 binding sites (-364 to -343)
and CArG box (-685 to -672) are in bold. The Alu sequence present in the 5'
region is underlined.

B) The corresponding amino acids are shown below the sequence. The coding
sequence between the ATG initiation codon and the TGA stop codon is 2466 bp,
encoding for a 821 amino acid protein. The adenine in the first methionine codon
has been assigned position 1. Locations of introns within the nCL1 gene are
indicated by arrowheads. Nucleotides which differ from the previously published
ones are indicated by asterisks.

Figure 3: Alignments of amino acid sequences of the muscle-specific calpains

The human nCL1 protein is shown on the first line. The 3 muscle-specific
[...] The second line corresponds to the
rat sequence (Accession no J05121). The third and fourth lines show the deduced
amino acid sequences encoded by pig and bovine Expressed Sequence
Tags (GenBank accession no U05678 and U07858, respectively). The
amino acid residues which are conserved among all known members of the
calpains are in reverse letters. A period indicates that the same amino acid is
present in the sequence. Letters refer to the variant amino acid found in the
homologous sequence. Positions of missense mutations are given as numbers
above the mutated amino acid.

Figure 4: Distribution of the mutations along nCL1 protein structure

A) Positions of the 23 introns are indicated by vertical bars in relation to the
corresponding amino acid coordinates.

B) The nCL1 protein is depicted showing the four domains (I, II, III, IV) and
the muscle specific sequences (NS, IS1 and IS2). The positions of missense
mutations within nCL1 domains are indicated by black dots. The effects of
nonsense and frameshift mutations are illustrated as truncated lines
representing the extent of protein synthesised. Name of the [...]

Figure 5: Northern blot hybridisation of a nCL1 clone

A mRNA blot (Clontech) containing 2 µg of poly(A)+ RNA from each of
eight human tissues was hybridised with a nCL1 genomic clone spanning exons
20 and 21. The latter detects a 3.6 kb mRNA present only in the lane
corresponding to the skeletal muscle mRNA.

Figure 6: Representative mutations identified by heteroduplex analysis. 

Examples of mutation screening by heteroduplex analysis. Pedigree B505 
shows the segregation of two different mutations in exon 22. 
Figure 7: Homozygous mutations in the nCL1 gene

Detection by sequencing of mutations in exons 2 (a), 8 (b), 13 (c) and 22
(d). Sequences from a healthy control are shown above each mutant sequence.
Asterisks indicate the position of the mutated nucleotides. The consequences on
codon and amino acid residues are indicated on the left of the figure together
with the name of the family.

Figure 8: Structure of nCL1 gene

Figure 8A represents the 5' part of the gene with exon 1.
Figure 8B represents the part of the gene including exons [...]
Figure 8C represents the part of the gene including exons [...]
Figure 8D represents the part of the gene including exons 10 to 24,
including the 3' non transcribed region.
EXAMPLES 
EXAMPLE 1 

Localisation of the nCL1 within the LGMD2A interval 

Detailed genetic and physical maps of the LGMD2A region were
constructed (Fougerousse et al., 1994), following the primary linkage assignment
to 15q (Beckmann et al., 1991). The disease locus was bracketed between the
D15S129 and D15S143 markers, defining the cytogenetic boundaries of the
LGMD2A region as 15q15.1-15q21.1 (Fougerousse et al., 1994). Construction
and analysis of a 10-12 Mb YAC contig (Fougerousse et al., 1994) permitted us
to map 33 polymorphic markers within this interval and to further narrow the
LGMD2A region to between D15S514 and D15S222.

The nCL1 gene had been localised to chromosome 15 by hybridisation with
sorted chromosomes and by Southern hybridisation to DNA from human-mouse
cell hybrids (Ohno et al., 1989). cDNA capture using YACs from the LGMD2A
interval allowed the identification of thirteen positional candidate genes. nCL1
was one of the two transcripts identified that showed muscle-specific expression
as evidenced by northern blot analysis. The localisation was further confirmed by
STS (Sequence Tagged Site) assays. Primers used for the localisation of the
nCL1 gene are P94in2, P94in13 and pcr6a3, as shown in Figure 1; their
characteristics are defined in Table 1.
Table 1: PCR primers used for localisation of the nCL1 gene.

Primer name  Primer sequences (5'-3')  Position within  Annealing   PCR product size (bp) on
                                       the cDNA         temp. (°C)  cDNA    genomic DNA

P94in2       ATGGAGCCAACAGAACTGAC      341-360          58          108     1758
             CTATGACTCGGAAAAGAAGGT     428-448
P94in13      TAAGCAAAAGCAGTCCCCAC      1893-1912        58          64      1043
             TTGCTGTTCCTCACTTTCCT      1936-1956
pcr6a3       GTTTCATCTGCTGCT??GT?      2342-2361        56          130     818
             CTGGTTCAGGCATACATGG       2452-2471
P94ex1ter    TTCT?TATGTGGACCCTGAG      218-239          55          76      76
             A??AACTGGATGGGGA?C??
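
The product sizes in Table 1 can be cross-checked against the primer coordinates. The following is a minimal sketch, assuming that the cDNA positions are inclusive 1-based coordinates and that the difference between the genomic and cDNA product sizes corresponds to intronic sequence lying between the two primers; the dictionary layout and function names are illustrative only.

```python
# Coordinates (cDNA numbering) and product sizes as listed in Table 1.
PRIMER_PAIRS = {
    "P94in2": {"start": 341, "end": 448, "cdna_bp": 108, "genomic_bp": 1758},
    "P94in13": {"start": 1893, "end": 1956, "cdna_bp": 64, "genomic_bp": 1043},
    "pcr6a3": {"start": 2342, "end": 2471, "cdna_bp": 130, "genomic_bp": 818},
}


def cdna_product_size(pair):
    """Size of the cDNA amplicon from inclusive start/end coordinates."""
    return pair["end"] - pair["start"] + 1


def intron_content(pair):
    """Intronic sequence (bp) between the primers, inferred from the
    difference between the genomic and cDNA product sizes."""
    return pair["genomic_bp"] - pair["cdna_bp"]


if __name__ == "__main__":
    for name, pair in PRIMER_PAIRS.items():
        print(name, cdna_product_size(pair), intron_content(pair))
```

Under these assumptions the computed cDNA sizes agree with the listed values, and each of the three pairs spans at least one intron, which is consistent with their use as intragenic STSs.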



These primers are designed from different parts of the published human
cDNA sequence (Sorimachi et al., 1989), and were used for an STS content
screening on DNA from three chromosome 15 somatic cell hybrids and YACs
from the LGMD2A contig. The results positioned the gene in a region previously
defined as 15q15.1-q21.1 and on 3 YACs (774G4, 926G10, 923G7) localised in
this region. The relative positions of STSs along the LGMD2A contig allowed us to
localise the gene between D15S512 and D15S488, in a candidate region
suggested by linkage disequilibrium studies.

The same primers as above were used to screen a cosmid library from YAC
774G4. A group of 5 cosmids was identified (Fig. 1). Experiments with another
nCL1 primer pair (P94ex1ter, Table 1) established that these cosmids cover all
nCL1 exons except number 1, and that a second group of 4 cosmids contains this
exon (Fig. 1). A minimal set of three overlapping cosmids (2G8-2B11-1F11)
covers the entire gene (Figure 1). DNA from these cosmids was used to
construct an EcoRI restriction map of this region (Figure 1B).
EXAMPLE 2 

Determination of the nCL1 gene sequence 

Most of the sequences were obtained through shotgun sequencing of partial
digests of cosmid 1F11 subcloned in M13 and Bluescript vectors, and by walking
with internal primers. The sequence assembly was made using the XBAP
software of the Staden package (Staden) and was in agreement with the
restriction map of the cosmids. Sequences of exon 1 and adjacent regions were
obtained by sequencing cosmid DNA or PCR products from human genomic
DNA. The first intron is still not fully sequenced, but there is evidence that it may
be between 10 and 16 kb in length (based on hybridisation of restriction fragments;
data not shown). The entire gene, including its 5' and 3' regions, is more than 40
kb long, and is shown in Figure 8.

a) the cDNA sequence 

The technology used allowed the published human
cDNA sequence of nCL1 (Sorimachi et al., 1989) to be completed. The sequence
contains the missing 129 bases
corresponding to the N-terminal 43 amino acids (Figure 2). It also differs from the
published sequence at 12 positions, three of which occur at third base positions of
codons and preserve the encoded amino acid sequence. The other 9 differences
lead to changes in amino acid composition (Figure 2). As these different exons were
sequenced repeatedly on at least 10 distinct genomes, we are confident that the
sequence of Fig. 2 represents an authentic sequence and does not contain
minor polymorphic variants. Furthermore, these modifications increase the local
similarity with the rat nCL1 amino acid sequence (Sorimachi), although the
overall similarity is still 94 %.

The ATG numbered 1 in Figure 2 is the translation initiation site based on
homology with the rat nCL1, and is within a sequence with only 5 nucleotides out
of 8 in common with the Kozak consensus sequence (Kozak M., 1984). Putative
CCAAT and TATA boxes were observed 590 and 324 bp (CCAAT) and 544 or 33 bp
(TATA) upstream of the initiating ATG codon, respectively (Bucher, 1990). A GC-
box binding the Sp1 protein (Dynan et al., 1983) was identified at position -477.




WO 96/16175 PCT/EP95/04575



Consensus sequences corresponding to potential muscle-specific regulatory
elements were identified (Fig. 2). These include a myocyte-specific enhancer-
binding factor 2 (MEF2) binding site (Cserjesi P., 1991), a CArG box (Minty A.,
1986) and 6 E-boxes (binding sites for basic Helix-Loop-Helix proteins frequently
found in members of the MyoD family; Blackwell and Weintraub, 1990). The functional
significance of these putative transcription factor binding sites in the regulation
of nCL1 gene expression remains to be established.

Two potential AAUAAA polyadenylation signals were identified 520 and 777
bp downstream of the TGA stop codon. The sequencing of a partial nCL1 cDNA
containing a polyA tail demonstrated that the first AAUAAA is the
polyadenylation signal. The latter is embedded in a region well conserved with
the rat nCL1 sequence and is followed after 4 bp by a G/T cluster, present in
most genes 3' of the polyadenylation site (Birnstiel et al., 1985). The 3'-
untranslated region of the nCL1 mRNA is 565 bp long. The predicted length of
the cDNA should therefore be approximately 3550 or 3000 bp.
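
The two predicted cDNA lengths follow from simple addition. The sketch below is a rough check only: it assumes that transcription starts close to the putative TATA box (in practice the start site lies somewhat downstream of it, so the sums slightly overestimate the transcript), and it derives the coding length from the 821 amino acid protein plus the stop codon. All names are illustrative.

```python
PROTEIN_LENGTH_AA = 821                 # protein length given for Figure 2
CDS_BP = (PROTEIN_LENGTH_AA + 1) * 3    # 821 codons plus the TGA stop codon
THREE_PRIME_UTR_BP = 565                # 3'-untranslated region of the mRNA
TATA_UPSTREAM_BP = (544, 33)            # the two putative TATA box positions


def predicted_cdna_length(tata_upstream_bp):
    """Approximate transcript length if transcription began at the TATA box."""
    return tata_upstream_bp + CDS_BP + THREE_PRIME_UTR_BP


if __name__ == "__main__":
    for position in TATA_UPSTREAM_BP:
        print(position, predicted_cdna_length(position))
```

The sums come to 3575 bp and 3064 bp, in line with the rounded figures of approximately 3550 and 3000 bp.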

b) Comparison with calpain 

The sequence of the human nCL1 gene was compared to those of other
calpains (Figure 3). The most telling comparisons are with the
homologous rat (Accession no J05121), bovine (Accession no U07858) and
porcine (Accession no U05678) sequences. The accession numbers refer to
those of international gene banks, such as GenBank (N.I.H.) or the EMBL Database
(EMBL, Heidelberg). High local similarities between the human and rat DNA
sequences are even observed in the 5' (75%) or in different parts of the 3'
untranslated regions (over 60%) (data not shown). The high extent of sequence
homology manifested by the human and rat nCL1 genes in their untranslated
regions is suggestive of evolutionary pressures on common putative regulatory
sequences.

c) Genomic organisation of the nCLI gene 

A comparison of the published nCL1 human cDNA (Sorimachi et al., 1989)
with the corresponding genomic sequence led to the identification of 24 exons
ranging in length from 12 bp (exon 12) to 309 bp (exon 1), with a mean size of
100 bp (Figure 1). The size of introns ranges from 86 bp to about 10-16 kb for
intron 1.

The intron-exon boundaries as shown in Table 2 exhibit close adherence to
5' and 3' splice site consensus sequences (Shapiro and Senapathy, 1987).

Table 2: Sequences at the intron-exon junctions. A score expressing adherence
to the consensus was calculated for each site according to Shapiro and
Senapathy (1987). Sequences of exons and introns are in upper and lower
cases, respectively. Sizes of exons are given in parentheses.

splice donor site   score  intron          score  splice acceptor site        exon
                    (%)                    (%)

                                                                              Exon 1 (309 bp) ->
...CTCCGgtgagt...   88.5   <-Intron 1->    99.0   ...tttttgt?cacagGAAAT...    Exon 2 (70 bp) ->
...GCTAGgtagga...   83.5   <-Intron 2->    90.0   ...gtgtctgcctgcagGGGAC...   Exon 3 (119 bp) ->
...TCCAGgtgagg...   92     <-Intron 3->    81.5   ...acgcttagtgcagTTCTG...    Exon 4 (134 bp) ->
...GCTAAgtaagc...   82     <-Intron 4->    81.5   ...atcaaactaagGCTCC...      Exon 5 (169 bp) ->
...TTGATgtaagt...   87     <-Intron 5->    79.5   ...ccatcgggcacagGATGG...    Exon 6 (144 bp) ->
...CCCGGgtgtgt...   77.5   <-Intron 6->    91     ...ttactgac?cagACAAT...     Exon 7 (84 bp) ->
...ATGAGgta?cc...   94     <-Intron 7->    78.5   ...tagtgtg?ttaagGTCCC...    Exon 8 (86 bp) ->
...GATAGgtagct...   89     <-Intron 8->    91.5   ...cattttcccaccagATGGA...   Exon 9 (78 bp) ->
...TTCTGgtcact...   88     <-Intron 9->    92     ...ttccaacaacagGATGT...     Exon 10 (161 bp) ->
...CCCAGgtggga...   80     <-Intron 10->   68.5   ...ttagggggtgcagATACT...    Exon 11 (170 bp) ->
...ACGAGgtgtgt...   85.5   <-Intron 11->   86     ...tgtttatacaagGTTCC...     Exon 12 (12 bp) ->
...AAGAGgtatag...   70     <-Intron 12->   87     ...tccccatctacagATGCA...    Exon 13 (209 bp) ->
...TCTGAgtgagt...   76.5   <-Intron 13->   97     ...tgtattcacacagGGAAG...    Exon 14 (37 bp) ->
...CAGTGgtgagt...   89     <-Intron 14->   93.5   ...atttcttatgcagAAAAA...    Exon 15 (18 bp) ->
...CCAAGgtaggt...   89     <-Intron 15->   87     ...cacctctaccagCCCAT...     Exon 16 (114 bp) ->
...CACAGgtgtct...   80     <-Intron 16->   88     ...ttgtgcaccacagCCACA...    Exon 17 (78 bp) ->
...GAGATgtgagt...   84     <-Intron 17->   92.5   ...ccc?cacacagGACAT...      Exon 18 (58 bp) ->
...CAAACgtgagt...   83     <-Intron 18->   90     ...accatccccccagACAAG...    Exon 19 (65 bp) ->
...TGGATgtatcc...   56     <-Intron 19->   88     ...caccctcaccagACAGA...     Exon 20 (69 bp) ->
...GGCAG?ggga...    80     <-Intron 20->   94     ...ttttctattgccagAAATA...   Exon 21 (79 bp) ->
...CGCAGgtgctg...   66     <-Intron 21->   91     ...ggtcccaccacagGATTC...    Exon 22 (117 bp) ->
...GTTCAgtaagt...   79     <-Intron 22->   93.5   ...gcattctttcacagGAGCT...   Exon 23 (59 bp) ->
...TGGAG?taaag...   81     <-Intron 23->   79     ...gggaata?cagTGGCT...      Exon 24 (27 bp) ->



When the genomic sequence was submitted to GRAIL analysis (Uberbacher
et al., 1991), 11 exons were correctly recognised, 4 were not identified, 6 were
inadequately defined and 2 were too small to be recognised (data not shown).
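
The exon sizes listed in Table 2 can be checked for internal consistency with the length of the coding sequence. The sketch below assumes that the listed sizes refer to the coding portions of the 24 exons, so that their sum should equal the 821 codons of the protein plus the stop codon; the variable names are illustrative.

```python
# Exon sizes (bp) for exons 1 to 24, as listed in Table 2.
EXON_SIZES_BP = [
    309, 70, 119, 134, 169, 144, 84, 86, 78, 161, 170, 12,
    209, 37, 18, 114, 78, 58, 65, 69, 79, 117, 59, 27,
]

PROTEIN_LENGTH_AA = 821  # protein length given for Figure 2

total_bp = sum(EXON_SIZES_BP)
mean_bp = total_bp / len(EXON_SIZES_BP)

if __name__ == "__main__":
    print("exons:", len(EXON_SIZES_BP))
    print("total coding length (bp):", total_bp)
    print("expected from protein length (bp):", (PROTEIN_LENGTH_AA + 1) * 3)
    print("mean exon size (bp):", round(mean_bp, 1))
```

Under this assumption the sizes add up to the full coding sequence, and the mean is close to the stated 100 bp.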



As already noted, the nCL1 gene has three unique sequence blocks: NS
(amino acid residues 1 to 61), IS1 (residues 267 to 329) and IS2 (residues 578
to 653). It is interesting to note that each of these sequences, as well as the
nuclear translocation signal inside IS2, are essentially flanked by introns (Fig. 4).
The exon-intron organisation of the human nCL1 is similar to that reported for
the chicken CANP (the only other large subunit calpain gene whose genomic
structure is known; Emori et al., 1985).

Four microsatellite sequences were identified. Two of them are in the distal
part of the first intron: an (AT)14 and a previously identified mixed-pattern
microsatellite, S774G4B8, which was demonstrated to be non polymorphic
(Fougerousse et al., 1994). A (TA)7(CA)4(GA)13 was identified in the second
intron and genotyping of 64 CEPH unrelated individuals revealed two alleles
(with frequencies of 0.10 and 0.90). The fourth microsatellite is a mixed
(CA)n(TA)m repeat present in the 9th intron. The latter and the (AT)14 repeat
have not been investigated for polymorphism. Fourteen repetitive sequences of
the Alu family and one Mer2 repeat were identified in the nCL1 gene (Fig. 1C),
which has, thus, on average one Alu element per 2.5 kb.

Southern blot experiments (Ohno et al., 1989) and STS screening (data not
shown) suggest that there is but one copy per genome of this member of the
calpain family.

EXAMPLE 3

Expression of the nCL1 gene

The pattern of tissue-specificity was investigated by northern blot
hybridisation with a genomic subclone probe from cosmid 1F11 spanning exons
20 and 21. There is no evidence for the existence of an alternatively spliced form
of nCL1, although this cannot be excluded. A transcript of about 3.4-3.6 kb was
detected in skeletal muscle mRNA (Figure 5). This size therefore favours
position -544 as the functional TATA box.

Transcription studies suggested that it is an active gene rather than a 
pseudogene and its muscle-specific pattern of expression is consistent with the 
phenotype of this disorder (Sorimachi et al., 1989 and Figure 5). 

EXAMPLE 4 

Mutation screening 

nCL1 fulfils both positional and functional criteria to be a candidate gene for
LGMD2A. To evaluate its role in the etiology of this disorder, nCL1 was
systematically screened in 38 LGMD2 families for the presence of nucleotide
changes using a combination of heteroduplex (Keen et al., 1991) and direct
sequence analyses.

PCR primers were designed to specifically amplify the exons and splice
junctions and also the regions containing the putative CAAT and TATA boxes and the
polyadenylation signal of the gene, as shown in Table 3.

Table 3: PCR primers used for the analysis of the nCL1 gene in LGMD patients.

Columns: amplified region; primer sequences (5'-3'); size (bp); annealing temp. (°C)

Amplified region: promoter, exon 1, exon 2, exon 3, exon 4, exon 5, exon 6,
exon 7, exon 8, exon 9, exon 10, exon 11, exon 12, exon 13

Primer sequences (5'-3'):
TTCAGTACCTCCCGTTCACC
GATGCTTGAGCCAGGAAAAC
CTTTCCTTGAAGGTAGCTGTA?
GAGGTGCTGAGTGAGAGGA?
ACTCCGTCTCAAAAAAATACCT
ATTGTCCCTTTACCTCCTG
TGGAAGTAGGAGAGTGGGCA
GGGTAGATGGGTGGGAAGTT
GAGGAATGTGGAGGAAGGA?
TTCCTGTGAGTGAGGTCTC?
GGAACTCTGTGACCCCAAAT
TCCTCAAACAAAACATTCGC
AATGGGTTCTCTGGTTACTGC
AGCACGAAAAGCAAAGATAAA
GTAAGAGATTTGCCCCCCAG
TCTGCGGATCATTGGTTTTG
CCTTCCCTTCTTCCTGCTT?
CTCTCTTCCCCACCCTTACC
CCTCCTCACCTGCTCCCAT?
TTTTTCGGCTTAGACCCTCC
TGTGGGGAATAGAAATAAATGG
CCAGGAGCTCTGTGGGTCA
GGCTCCTCATCCTCATTCACA
GTGGAGGAGGGTGAGTGTGC
TGTGGCAGGACAGGACGTTC

Annealing temp. (°C): 59, 60, 57, 58, 59, 56, 57, 56, 58, 56, 56, 57, 61, 60

Amplified region: exon 14, exon 15, exon 16, exon 17, exon 18, exon 19,
exons 20-21, exon 22, exons 22-23, exon 24, polyadenylation signal

Primer sequences (5'-3'):
TTCAACCTCTGGAGTGGGCC
CACCAGAGCAAACCGTCCAC
ACAGCCCAGACTCCCATTCC
TTCTCTTCTCCCTTCACCC
ACACACTTCATGCTCTCTACCC
CCGCCTATTCCTTTCCTCTT
GACAAACTCCTGGGAAGCCTC
ACCTCTGACCCCTGTGAAC?
TGTGGATTTGTGTGCTACGC
CATAAATAGCACCGACAGGGA
GGGATGGAGAAGAGTGAGGA
TCCTCACTCTTCTCCATCCC
ACCCTGTATGTTGCCTTGG
GGGGATTTTGCTGTGTGCTG
ATTCCTGCTCCCACCGTCTC
CACAGAGTGTCCGAGAGGCA
GGAGATTATCAGGTGAGATGCC
CAGAGTGTCCGAGAGGCAGGGC
CGTTGACCCCTCCACCTTG
GGGAAAACATGCACCTTCTT
TAGGGGGTAAAATGGAGGAG
ACTAACTCAGTGGAATAGGG
GGAGCTAGGATAGCTCAAT

Annealing temp. (°C): 61, 56, 61, 59, 57, 61, 57, 61, 58, 56
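
For orientation, the annealing temperatures in Table 3 can be compared with a quick melting-temperature estimate. The sketch below uses the Wallace rule (2 °C per A or T, 4 °C per G or C), a textbook approximation for short oligonucleotides; it is an assumption made here for illustration and is not stated in this specification as the method used to choose the annealing temperatures. The function name is hypothetical.

```python
def wallace_tm(primer):
    """Estimate the melting temperature (deg C) of a short oligonucleotide
    with the Wallace rule: 2 per A/T plus 4 per G/C."""
    seq = primer.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("primer contains characters other than A, C, G, T")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    # First primer listed in Table 3.
    print(wallace_tm("TTCAGTACCTCCCGTTCACC"))
```

For the first primer listed this gives an estimate a few degrees above the first listed annealing temperature of 59 °C, which is the usual relationship between a melting-temperature estimate and a working annealing temperature.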



PCR products made on DNA from blood of specific LGMD2A patients were
then subjected either to heteroduplex analysis or to direct sequencing,
depending on whether the mutation, based on haplotype analysis, was expected
to be homozygous or heterozygous, respectively. It was occasionally necessary
to clone the PCR products to precisely identify the mutations (i.e., for
microdeletions or insertions and for some heterozygotes). Disease-associated
mutations are summarised in Table 4 hereunder and their position along the
protein is shown in Fig. 4.

Table 4: nCL1 mutations in LGMD2A families.

Codons and amino acid positions are numbered on the basis of the cDNA
sequence starting from ATG.

Exon  Families         Nucleotide  Nucleotide      Amino acid  Amino acid   Restriction
                       position    change          position    change       site

2     B519*            328         CGA -> TGA      110         Arg -> stop
4     M42              545         CTG -> CAG      182         Leu -> Gln
4     M1394; M2888     550         CAA -> CA       184         frameshift
      M35; M37         701         GGG -> GAG      234         Gly -> Glu
      M32                          CGG -> CG       315         frameshift   -SmaI
                       1061        GTG -> GGG      354         Val -> Gly
8     M1394            1079        TGG -> TAG      360         Trp -> stop  -BstNI
      M2888            1468        CGG -> TGG      490         Arg -> Trp
13    R12*             1715        CGG -> CAG      572         Arg -> Gln   -MspI
19    R27                          deletion AC     690         frameshift
21                     2230        AGC -> GGC      744         Ser -> Gly   -AluI
22    A*; B501*; M32   2306        CGG -> CAG      769         Arg -> Gln
22    B505             2313-2316   deletion AGAC   771-772     frameshift
22    R14; B505        2362-2363   AG -> TCATCT    788         frameshift





The first letter of the family code refers to the origin of the population: B = Brazil,
M = metropolitan France, R = Isle of La Reunion, A = Amish.
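
The numbering convention stated above (codons and nucleotides counted from the ATG) links the nucleotide and amino acid columns of Table 4: codon n covers nucleotides 3n-2 to 3n. The following Python sketch is an illustration only, not part of the original analysis. It recomputes the nucleotide position and the amino acid consequence of some of the single-base substitutions of Table 4 from the standard genetic code. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def substitution_position(codon_number, ref_codon, alt_codon):
    """Return the 1-based cDNA position (A of ATG = 1) of a single-base change."""
    diffs = [i for i in range(3) if ref_codon[i] != alt_codon[i]]
    if len(diffs) != 1:
        raise ValueError("expected exactly one changed base")
    return 3 * (codon_number - 1) + diffs[0] + 1


def consequence(ref_codon, alt_codon):
    """Return (reference residue, mutant residue, class) for a codon change."""
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        kind = "silent"
    elif alt_aa == "*":
        kind = "nonsense"
    else:
        kind = "missense"
    return ref_aa, alt_aa, kind


# Substitutions taken from Table 4: (codon number, reference codon, mutant codon).
table4_substitutions = [
    (110, "CGA", "TGA"),
    (360, "TGG", "TAG"),
    (572, "CGG", "CAG"),
    (769, "CGG", "CAG"),
]
for codon, ref, alt in table4_substitutions:
    print(codon, substitution_position(codon, ref, alt), consequence(ref, alt))
```

For codons 110, 572 and 769 the computed positions are 328, 1715 and 2306, which agree with the nucleotide positions given for these mutations in the table and in the text.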

Each mutation was confirmed by heteroduplex analysis, by sequencing of
both strands in several members of the family or by enzymatic digestion when
the mutation resulted in the modification of a restriction site. Segregation
analyses of the mutations, performed on DNAs from all available members of the
families, confirmed that these sequence variations are on the parental
chromosome carrying the LGMD2A mutation. To exclude the possibility that the
missense substitutions might be polymorphisms, their presence was
systematically tested in a control population: none of these mutations was seen
among 120 control chromosomes from the CEPH reference families.

EXAMPLE 5 

Analysis of chromosome 15-ascertained families
The initial screening for causative mutations was performed on families,
each containing a LGMD gene located on chromosome 15. These included
families from the Island of La Reunion (Beckmann et al., 1991), from the Old
Order Amish from northern Indiana (Young et al., 1992) and 2 Brazilian families
(Passos-Bueno et al., 1993).

a) Reunion Island families

Genealogical studies and geographic isolation of the families from the Isle
of La Reunion were suggestive of a single founder effect. Genetic analyses are,
however, inconsistent with this hypothesis as the families present haplotype
heterogeneity. At least six different carrier chromosomes are encountered (with
affected individuals in several families being compound heterozygotes). Distinct
mutations corresponding to four of these six haplotypes have been identified
thus far.

In family R14, exons 13, 21 and 22 showed evidence for sequence variation
upon heteroduplex analysis (Fig. 6). Sequencing of the associated PCR products
revealed (i) a polymorphism in exon 13, (ii) a missense mutation (A->G) in exon
21 transforming the Ser744 residue to a glycine in the loop of the second
EF-hand in domain IV of the protein (Figure 4), and (iii) a frameshift mutation in exon
22. The exon 21 mutation and the polymorphism in exon 13 form a haplotype
which is also encountered in family R17. Subcloning of the PCR products was
necessary to identify the exon 22 mutation. Sequencing of several clones
revealed a replacement of AG by TCATCT (data not shown). This frameshift
mutation causes premature termination at nucleotide 2400, where an in-frame
stop codon occurs (Figure 4).

The affected individuals in family R12 are homozygous for all markers of the
LGMD2A interval (Allamand, submitted). Sequencing of the PCR products of
exon 13 revealed a G to A transition at base 1715 of the cDNA resulting in a
substitution of glutamine for Arg572 (Figure 7) within domain III, a residue which
is highly conserved throughout all known calpains. This mutation, detectable by
loss of an MspI restriction site, is present only in this family and in no other
examined LGMD2A families or unrelated controls.
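
Loss of a restriction site, as used here for MspI, can be checked in silico by searching the reference and mutant sequences for the recognition sequence of the enzyme. The sketch below is a minimal illustration. The recognition sequences are the standard ones for the enzymes named in Table 4, but the flanking sequence is hypothetical (it is not the real exon 13 sequence) and the helper names are invented for this example.

```python
import re

# Recognition sequences (5'->3'); W = A or T, following IUPAC codes.
SITES = {"MspI": "CCGG", "SmaI": "CCCGGG", "AluI": "AGCT", "BstNI": "CCWGG"}
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]", "N": "[ACGT]"}


def site_positions(seq, enzyme):
    """Return 0-based start positions of every occurrence of the enzyme's site."""
    pattern = "".join(IUPAC[b] for b in SITES[enzyme])
    # A lookahead lets overlapping sites be reported as well.
    return [m.start() for m in re.finditer(f"(?={pattern})", seq.upper())]


def site_lost(ref_seq, mut_seq, enzyme):
    """True if the mutant carries fewer sites for the enzyme than the reference."""
    return len(site_positions(mut_seq, enzyme)) < len(site_positions(ref_seq, enzyme))


# Hypothetical flanking sequence around a CGG codon; NOT the real exon 13 sequence.
reference = "ATTGACCGGTTCAAG"
mutant = "ATTGACCAGTTCAAG"  # G -> A in the CGG codon, giving CAG

print(site_positions(reference, "MspI"))
print(site_lost(reference, mutant, "MspI"))
```

All four recognition sequences are palindromic, so scanning one strand is sufficient in this sketch.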

In family R27, heteroduplex analysis followed by sequencing of the PCR
products of an affected child revealed a two base pair deletion in exon 19
(Figure 6 and Table 4). One AC out of three is missing at this position of the
sequence, producing a stop codon at position 2069 of the cDNA sequence
(Figure 4).
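
How a two-base deletion within a run of AC repeats leads to a premature stop can be illustrated by translating a sequence before and after the deletion. The sketch below uses a short hypothetical coding sequence, not the nCL1 cDNA, and invented helper names. It demonstrates the reading-frame logic only.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate from the first base, stopping at the first in-frame stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def delete(cds, start, length):
    """Remove `length` bases beginning at 0-based index `start`."""
    return cds[:start] + cds[start + length:]


# Hypothetical sequence containing three consecutive AC repeats (indices 5-10).
wild_type = "ATGGCACACACTGATCCGGTTCAGTAA"
mutant = delete(wild_type, 5, 2)  # one AC out of three is removed

print(translate(wild_type))
print(translate(mutant))
```

In this toy example the deletion shifts the reading frame so that a TGA codon comes into frame immediately after the repeat, which truncates the translated product.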

b) Amish families

As expected, due to multiple consanguineous links, the examined LGMD2A
Northern Indiana Amish patients were homozygous for the haplotype on the
chromosome bearing the mutant allele (Allamand, submitted). A (G->A)
missense mutation was identified at nucleotide 2306 within exon 22 (Fig. 7). The
resulting codon change is CGG to CAG, transforming Arg769 to glutamine. This
residue, which is conserved throughout all members of the calpain family in all
species, is located in domain IV of the protein within the 3rd EF-hand at the
helix-loop junction (ref). This mutation was encountered in a homozygous state
in all patients from 12 chromosome 15-linked Amish families, in agreement with
the haplotype analysis. We also screened six Southern Indiana Amish LGMD
families, for which the chromosome 15 locus was excluded by linkage analyses
(Allamand ESHG, submitted, ASHG 94). As expected, this nucleotide change
was not present in any of the patients from these families, thus confirming the
genetic heterogeneity of this disease in this genetically related isolate.

c) Brazilian families

As a result of consanguineous marriages, two Brazilian families (B501,
B519) are homozygous for extended LGMD2A carrier haplotypes (data not
shown). Sequencing PCR products from affected individuals of these families
demonstrated that family B501 has the same exon 22 mutation found in northern
Indiana Amish patients (Figure 7), but embedded in a completely different
haplotype. In family B519, the patients carry a C to T transition at nucleotide 328
in exon 2, replacing Arg110 with a TGA stop codon (Figure 7), thus leading,
presumably, to a very truncated protein (Figure 4).

d) Analysis of other LGMD families

Having validated the role of the candidate gene in the chromosome 15
ascertained families, we next examined by heteroduplex analysis LGMD families
for which linkage data were not informative. These included one Brazilian (B505)
and 13 metropolitan French pedigrees.

Heteroduplex bands were revealed for exons 1, 3, 4, 5, 6, 8, 11 and 22 of one
or more patients (Figure 6). Of all sequence variants, 10 were identified as
possible pathogenic mutations (5 missense, 1 nonsense and 4 frameshift
mutations) and 3 as polymorphisms with no change of amino acid of the protein.
All causative mutations identified are listed in Table 4 here-above. Identical
mutations were uncovered in apparently unrelated families. The mutations
shared by families M35 and M37, and M2888 and M1394, respectively, are likely
to be the consequence of independent events since they are embedded in
different marker haplotypes. In contrast, it is likely that the point mutation in exon
22 of the Amish and in the M32 kindreds corresponds to the same mutational
event as both chromosomes share a common four marker haplotype (774G4A1-
774G4A10-774G454D-774G4A2) around nCL1 (data not shown), possibly
reflecting a common ancestor. The same holds true for the AG to TCATCT
substitution mutation encountered in exon 22 in families B505 and R14. The
exon 8 (T->G) transversion is present in the two carrier chromosomes of M2407,
the only metropolitan family homozygous by haplotype, possibly reflecting an
undocumented consanguinity. For some families, no disease-causing mutation
has been detected thus far (M40 for example).

In addition to the polymorphism present in exon 13 in families R14 and R17
(position 668) and in the intragenic microsatellites, four additional neutral
variations were detected: a (T->C) transition at position 96, abolishing a DdeI
restriction site in exon 1 in M31; a (C->T) transition in exon 3 (position 495) in
M40 and in M37, forming a haplotype with the exon 5 mutation (in the former
family, this polymorphism does not cosegregate with the disease); a (T->C)
transition in the paternally derived promoter in M42 at position -428, which was
also evidenced in healthy controls; and a variable poly(G) in intron 22 close to
the splice site in families R20, R11, R19, M35 and M37. The latter is also
present in the members of the CEPH families, but is not useful as a genetic
marker as the visualisation and interpretation of mononucleotide repeat alleles is
difficult.

In total, sixteen independent mutational events representing fourteen
different mutations were identified. All mutations cosegregate with the disease in
LGMD2A families. The characterised morbid calpain alleles contain nucleotide
changes which were not found in alleles from normal individuals. The discovery of
two nonsense and five frameshift mutations in nCL1 supports the hypothesis that
a deficiency of this product causes LGMD2A. All seven mutations result in a
premature in-frame stop codon, leading to the production of truncated and
presumably inactive proteins (Figure 4). Evidence for the morbidity of the
missense mutations comes from (1) the relatively high incidence of such mutations
among LGMD2A patients; although it is difficult in the absence of functional
assays to differentiate between a polymorphism and a morbid mutation, the
different missense mutations occurring in this gene cannot all be
accounted for as rare private polymorphisms; (2) the failure to observe these
mutations in control chromosomes; and (3) the occurrence of mutations in
evolutionarily conserved residues and/or in regions of documented functional
importance. Four of seven missense mutations change an amino acid which is
conserved in all known members of the calpain family in all species (Figure 3).
Two of the remaining mutations affect less conserved amino acid residues, but
are located in important functional domains. The substitution V354G in exon 8 is
4 residues before the asparagine at the active site and S744G in exon 21 is
within the loop of the second EF-hand and may impair the calcium-dependent
regulation of calpain activity or the interaction with a small subunit (Figure 4).
Several missense mutations change a hydrophobic residue to a polar one, or
vice versa (Table 4), possibly disrupting higher order structures.
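
The counts quoted in this discussion can be cross-checked against the amino acid change column of Table 4. The short Python sketch below transcribes that column (one entry per distinct mutation) and tallies the classes. The classification rule and the variable names are illustrative assumptions.

```python
from collections import Counter

# Amino acid change column of Table 4, one entry per distinct mutation.
changes = [
    "Arg->stop", "Leu->Gln", "frameshift", "Gly->Glu", "frameshift",
    "Val->Gly", "Trp->stop", "Arg->Trp", "Arg->Gln", "frameshift",
    "Ser->Gly", "Arg->Gln", "frameshift", "frameshift",
]


def classify(change):
    """Assign a Table 4 entry to one of three mutation classes."""
    if change == "frameshift":
        return "frameshift"
    if change.endswith("->stop"):
        return "nonsense"
    return "missense"


tally = Counter(classify(c) for c in changes)
print(len(changes), dict(tally))
```

The tally gives two nonsense, five frameshift and seven missense mutations, fourteen in all, in agreement with the figures given in the text.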
METHODS 

Description of the patients

The LGMD2A families analysed were from 4 different geographic origins.
They included 3 Brazilian families, 13 interrelated nuclear families from the Isle
of La Reunion, 10 French metropolitan families and 12 US Amish families. The
majority of these families were previously ascertained to belong to the
chromosome 15 group by linkage analysis (Beckmann et al., 1991; Young et al.,
1992; Passos-Bueno et al., 1993). However, some families from metropolitan
France as well as one Brazilian family, B505, had non-significant lod scores for
chromosome 15. Genomic DNA was obtained from peripheral blood lymphocytes.

Sequencing of cosmid c774G4-1F11 and EcoRI restriction map of cosmids
Cosmid 1F11 (Figure 1C) was subcloned following DNA preparation through
Qiagen procedure (Qiagen Inc., USA) and partial digestion with either Sau3A,
RsaI or AluI. Size-selected restriction fragments were recovered from low-melting
agarose and eventually ligated with M13 or Bluescript (Stratagene, USA)
vectors. After electroporation in E. coli, recombinant colonies were picked in 100
µl of LB/ampicillin media. PCR reactions were performed on 1 µl of the culture in
10 mM Tris-HCl, pH 9.0, 50 mM KCl, 1.5 mM MgCl2, 0.1% Triton X-100, 0.01%
gelatine, 200 µM of each dNTP, 1 U of Taq Polymerase (Amersham) with 100 ng
of each vector primer. Amplification was initiated by 5 min denaturation at
95°C, followed by 30 cycles of 40 sec denaturation at 92°C and 30 sec annealing
at 50°C. PCR products were purified through Microcon devices (Amicon, USA)
and sequenced using the dideoxy chain termination method on an ABI
sequencer (Applied Biosystems, Foster City, USA). The sequences were
analysed and alignments performed using the XBAP software of the Staden
package, version 93.9 (Staden, 1982). Gaps between sequence contigs were
filled by walking with internal primers. The EcoRI restriction map of cosmids was
established essentially as described in Sambrook et al. (1989).
Northern Blot analysis 

The probes were labelled by random priming with [α-32P]dCTP.
Hybridisation was performed to human multiple tissue northern blots as 
recommended by the manufacturer (Clontech, USA). 

Analysis of PCR products from LGMD2A families 

One hundred ng of human DNA were used per PCR under the buffer and
cycle conditions described in Fougerousse (1994) (annealing temperature shown
in Table 3). Heteroduplex analysis (Keene et al., 1991) was performed by
electrophoresis of ten µl of PCR products on 1.5 mm-thick Hydrolink MDE gels
(Bioprobe) at 500-600 V for 12-15 h depending on the fragment length.
Migration profile was visualised under UV after ethidium bromide staining.

For sequence analysis, the PCR products were subjected to dye-dideoxy
sequencing, after purification through Microcon devices (Amicon, USA). When
necessary, depending on the nature of the mutations (e.g., frameshift mutation or
for some heterozygotes), the PCR products were cloned using the TA cloning kit
from Invitrogen (UK). One µl of product was ligated to 25 ng of vector at 12°C
overnight. After electroporation into XL1-Blue bacteria, several independent
clones were analysed by PCR and sequenced as described above.

The invention results from the finding that the nCL1 gene, when it is mutated,
is involved in the etiology of LGMD2A. This is exactly the contrary of what is stated
in the literature, e.g. that the disease is accompanied by the presence of a
deregulated calpain. Identification of nCL1 as the defective gene in LGMD2A
represents the first example of muscular dystrophy caused by a mutation affecting
a gene which is not a structural component of muscle tissue, in contrast with
previously identified muscular dystrophies such as Duchenne and Becker
(Bonilla et al., 1988), severe childhood autosomal recessive (Matsumara et al.,
1992), Fukuyama (Matsumara et al., 1993) and merosin-deficient congenital
muscular dystrophies (Tome et al., 1994).

The understanding of the LGMD2A phenotype needs to take into account
the fact that there is no active nCL1 protein in several patients, a loss compatible
with the recessive manifestation of this disease. Simple models in which this
protease would be involved in the degradation or destabilisation of structural
components of the cytoskeleton, extracellular matrix or dystrophin complex can
therefore be ruled out. Furthermore, there are no signs of such alterations by
immunocytogenetic studies on LGMD2 muscle biopsies (Matsumara et al., 1993;
Tome et al., 1994). Likewise, since LGMD2A myofibers are apparently not
different from other dystrophic ones, it seems unlikely that this calpain plays a
role in myoblast fusion, as proposed for ubiquitous calpains (Wang et al., 1989).

All the data disclosed in these examples confirm that the nCL1 gene is a
major gene involved in the disease when mutated.

The fact that morbidity results from the loss of an enzymatic activity raises 
hopes for novel pharmaco-therapeutic prospects. The availability of transgenic 
models will be an invaluable tool for these investigations. 

The invention also relates to the use of a nucleic acid or a sequence of
nucleic acid of the invention, or to the use of a protein coded by the nucleic acid,
for the manufacturing of a drug for the prevention or treatment of LGMD2.

The finding that a defective calpain underlies the pathogenesis of LGMD2A
may prove useful for the identification of the other loci involved in the LGMDs.
Other forms of LGMD may indeed be caused by mutations in genes whose
products are the CANP substrates or in genes involved in the regulation of nCL1
expression. Techniques such as the two-hybrid selection system (Fields et al.,
1989) could lend themselves to the isolation of the natural protein substrate(s) of
this calpain, and thus potentially help to identify other LGMD loci.

The invention also relates to the use of all or a part of the peptidic sequence
of the enzyme, or of the enzyme, product of the nCL1 gene, for the screening of the
ligands of this enzyme, which might also be involved in the etiology and the
morbidity of LGMD2.

The ligands which might be involved are for example substrate(s), activators
or inhibitors of the enzyme.

The nucleic acids of the invention might also be used in a screening method
for the determination of the components which may act on the regulation of the
gene expression.

A process of screening using either the enzyme or a host recombinant cell, 
containing the nCL1 gene and expressing the enzyme, is also a part of the 
invention. 

The pharmacological methods, and the use of nucleic acid and peptidic 
sequences of the invention are very potent applications. 

The methods used for such screenings of ligands or regulatory elements are 
those described for example for the screening of ligands using cloned receptors. 

The identification of mutations in the nCL1 gene provides the means for
direct prenatal or presymptomatic diagnosis and carrier detection in families in
which both mutations have been identified. Gene-based accurate classification
of LGMD2A families should prove useful for the differential diagnosis of this
disorder.

The invention relates to a method of detection of a predisposition to LGMD2 
in a family or a human being, such method comprising the steps of : 

- selecting one or more exons or flanking sequences which are sensitive in 
said family; 

- selecting the primers specific for this or these exons or their flanking
sequences, a specific example being the PCR primers of Table 3, or a hybrid
thereof,

- amplifying the nucleic acid sequence, the substrate for this amplification 
being the DNA of the human being to be checked for the predisposition, and 

- comparing the amplified sequence to the corresponding sequence derived 
from Figure 2 or Figure 8. 
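
The final comparison step can be sketched as a position-by-position check of the amplified sequence against the reference. This minimal Python illustration assumes that the two sequences are already aligned and of equal length, so it reports substitutions only; insertions and deletions would require a proper alignment. The sequences, the offset and the function name are hypothetical.

```python
def substitutions(reference, sample, offset=1):
    """List (position, reference base, sample base) for every mismatch.

    `offset` is the position of the first base of the fragment in the
    reference numbering, so that reported positions follow that numbering.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences differ in length; align them first")
    return [
        (offset + i, r, s)
        for i, (r, s) in enumerate(zip(reference.upper(), sample.upper()))
        if r != s
    ]


# Hypothetical fragment beginning at position 100 of a reference sequence.
reference_fragment = "GACACCCGGATC"
patient_fragment = "GACACCCAGATC"

print(substitutions(reference_fragment, patient_fragment, offset=100))
```

An empty list would indicate that no substitution was found in the amplified fragment relative to the reference.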

Table 2 indicates the sequences of the introns-exons junctions, and primers 
comprising in their structure these junctions are also included in the invention. 

All other primers suitable for such RNA or DNA amplification may be used in
the method of the invention.

In the same way, any suitable amplification method, PCR (for Polymerase
Chain Reaction ®), NASBA ® (for Nucleic acid Sequence Based Amplification)
or others, might be used.

The methods usually used in the detection of one-site mutations, like ASO
(allele-specific PCR), LCR, or ARMS (Amplification Refractory Mutation System),
may be implemented with the specific primers of the invention.

The primers, such as described in Tables 1 and 3, or including junctions of
Table 2, or more generally including the flanking sequences of one of the 24
exons, are also a part of the invention.

The kit for the detection of a predisposition to LGMD2 by nucleic acid
amplification is also in the scope of the invention; such a kit comprises at least
PCR primers selected from the group of:

a) those described in Table 1,

b) those described in Table 3,

c) those including the introns-exons junctions of Table 2,

d) those derived from primers defined in a), b) or c).

The nucleic acid sequence of claims 1 to 3 might be inserted in a viral or a
retroviral vector, said vector being able to transfect a packaging cell line.

The transfected packaging cell line might be used as a drug for gene
therapy of LGMD2.

The treatment of LGMD2 disease by gene therapy is implemented by a
pharmaceutical composition containing a component selected from the group of:

a) a nucleic acid sequence according to claims 1 to 4,

b) a cell line according to claim 24,

c) an amino acid sequence according to claims 5 to 9.